    Position Models and Language Modeling

    In statistical language modelling the classic model is the n-gram. This model is not able, however, to capture long-term dependencies, i.e. dependencies spanning more than n symbols. An alternative is the probabilistic automaton. Unfortunately, preliminary experiments show that this model is not yet competitive for language modelling, partly because it tries to model dependencies that are too long. We propose to improve the use of this model by restricting the dependency length to a more reasonable value. Experiments show a 45% reduction in perplexity on the Wall Street Journal language modelling task.
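
    As a rough illustration of the baseline being improved on, below is a minimal bigram (n = 2) language model with a perplexity computation. This is a generic sketch for context, not the authors' setup; the add-one smoothing and all names are assumptions.

    import math
    from collections import Counter

    def train_bigram(sentences):
        """Count unigrams and bigrams over a tokenised corpus (illustrative)."""
        unigrams, bigrams = Counter(), Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            unigrams.update(padded[:-1])
            bigrams.update(zip(padded[:-1], padded[1:]))
        return unigrams, bigrams

    def perplexity(sentences, unigrams, bigrams, vocab_size):
        """exp(-mean log-probability per token), with add-one smoothing."""
        log_prob, n_tokens = 0.0, 0
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            for prev, cur in zip(padded[:-1], padded[1:]):
                p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
                log_prob += math.log(p)
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)

    A lower perplexity means the model assigns higher probability to held-out text, which is the sense in which the 45% reduction above is an improvement.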

    A Discriminative Model of Stochastic Edit Distance in the form of a Conditional Transducer

    Many real-world applications, such as spell-checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, and thus independent of the distributions of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context where the edit operations are used. In other words, we obtain a kind of context-sensitive edit distance.
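
    For context, the hand-tuned setting the paper moves away from is the classical weighted Levenshtein distance, sketched below; the uniform default costs are arbitrary placeholders, precisely the quantities the paper proposes to learn instead.

    def weighted_edit_distance(s, t, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
        """Levenshtein distance with hand-tuned costs for the primitive operations."""
        m, n = len(s), len(t)
        # d[i][j] = cheapest way of editing s[:i] into t[:j]
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * del_cost
        for j in range(1, n + 1):
            d[0][j] = j * ins_cost
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                match = 0.0 if s[i - 1] == t[j - 1] else sub_cost
                d[i][j] = min(d[i - 1][j] + del_cost,    # delete s[i-1]
                              d[i][j - 1] + ins_cost,    # insert t[j-1]
                              d[i - 1][j - 1] + match)   # substitute or match
        return d[m][n]

    A context-sensitive edit distance, as learned by the paper's conditional transducer, would replace these three constants with costs that depend on the surrounding symbols.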

    A note on conformal symmetry in projective superspace

    We describe a sufficient condition for actions constructed in projective superspace to possess an SU(2) R-symmetry. We check directly that this condition implies that the corresponding hyperkähler varieties, constructed by means of the generalized Legendre transform, have a Swann bundle structure.

    Efficient Pruning of Probabilistic Automata

    Applications of probabilistic grammatical inference are limited by time and space constraints. In statistical language modelling, for example, the large corpora now available lead to automata with millions of states. We propose in this article a method for pruning automata (when restricted to tree-based structures) that is not only efficient (sub-quadratic) but also dramatically reduces the size of the automaton with only a small impact on the underlying distribution. Results are evaluated on a language modelling task.
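
    A minimal sketch of the general idea, under the assumption of a threshold criterion on empirical probability mass (the paper's actual pruning criterion may differ): cut every subtree of the prefix tree whose mass falls below a threshold, in a single traversal.

    class Node:
        """A state of a tree-shaped (prefix-tree) probabilistic automaton."""
        def __init__(self, count=0):
            self.count = count     # number of training strings reaching this state
            self.children = {}     # symbol -> Node

    def prune(node, total, threshold):
        """Drop subtrees whose empirical probability mass is below `threshold`.

        Visits each state once, so the cost is linear in the automaton size.
        """
        node.children = {
            sym: prune(child, total, threshold)
            for sym, child in node.children.items()
            if child.count / total >= threshold
        }
        return node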

    Use of Grammatical Inference in Natural Speech Recognition

    This paper presents the application of stochastic grammatical inference to speech recognition. In speech recognition, the acoustic signal is processed into a set of words that are combined to build sentences. Language models are then used to guide the speech recognition application toward the most pertinent combination. Up to now, statistical language models have been used. We suggest using stochastic formal grammars instead, built by machine learning algorithms. We first show that unaided grammatical inference cannot be used for speech recognition. We then show that smoothing is necessary and demonstrate the gain that can be obtained with a basic smoothing. We finally put forward a smoothing technique dedicated to stochastic formal grammars.
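
    The abstract does not spell out the basic smoothing; a common baseline, sketched under that assumption, is linear interpolation: mix the grammar's estimate with a backoff distribution so that no word sequence receives zero probability.

    def interpolate(p_grammar, p_backoff, lam=0.9):
        """Linear-interpolation smoothing of a probability estimate."""
        return lam * p_grammar + (1 - lam) * p_backoff

    # A sentence the inferred grammar cannot parse gets probability 0.0;
    # after interpolation with a uniform backoff over a 10,000-word
    # vocabulary it keeps a small non-zero probability.
    p = interpolate(0.0, 1.0 / 10_000)   # 1e-05

    Without some such smoothing, a single out-of-grammar sentence drives the recogniser's score to zero, which is one reason unaided grammatical inference fails on real speech.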

    Probabilistic Finite-State Machines – Part II

    Probabilistic finite-state machines are used today in a variety of areas of pattern recognition, and in fields to which pattern recognition is linked. In Part I of this paper, we surveyed these objects and studied their properties. In this Part II, we study the relations between probabilistic finite-state automata and other well-known string-generating devices such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent the current state of the art for these objects.
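
    As a concrete reminder of the object under study, here is a minimal probabilistic finite-state automaton scoring a string by the standard forward computation; the two-state machine in the example is made up for illustration.

    from collections import defaultdict

    def string_probability(transitions, initial, final, string):
        """Probability of `string` under a PFA, summed over all accepting paths.

        transitions: (state, symbol) -> list of (next_state, probability)
        initial:     state -> initial probability
        final:       state -> stopping probability
        """
        # forward[q] = probability of having read the prefix and being in state q
        forward = dict(initial)
        for symbol in string:
            nxt = defaultdict(float)
            for state, p in forward.items():
                for state2, tp in transitions.get((state, symbol), []):
                    nxt[state2] += p * tp
            forward = nxt
        return sum(p * final.get(q, 0.0) for q, p in forward.items())

    # A two-state PFA over {a, b} (made-up probabilities):
    transitions = {(0, "a"): [(0, 0.5)], (0, "b"): [(1, 0.3)], (1, "b"): [(1, 0.4)]}
    prob = string_probability(transitions, {0: 1.0}, {0: 0.2, 1: 0.6}, "ab")  # 0.09

    The same forward computation is what relates PFAs to hidden Markov models: an HMM's forward algorithm has exactly this shape, with emissions in place of labelled transitions.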